 Antwerp



MassSpecGym: A benchmark for the discovery and identification of molecules

Roman Bushuiev

Neural Information Processing Systems

Despite decades of progress in machine learning applications for predicting molecular structures from MS/MS spectra, the development of new methods is severely hindered by the lack of standard datasets and evaluation protocols. To address this problem, we propose MassSpecGym - the first comprehensive benchmark for the discovery and identification of molecules from MS/MS data.



The only person to win an Olympic medal and a Nobel Peace Prize

Popular Science

Philip Noel-Baker ran middle-distance races at the Olympics before dedicating his life to disarmament. In 1959, Philip Noel-Baker became the only person ever to win both an Olympic medal and a Nobel Peace Prize. The serious son of Quaker parents, Philip Noel-Baker was first a scholar, then an Olympian, and finally a Nobel Peace Prize winner. He is the only person ever to have won both an Olympic medal and a Nobel.


Detecting Unobserved Confounders: A Kernelized Regression Approach

Chen, Yikai, Mao, Yunxin, Zheng, Chunyuan, Zou, Hao, Gu, Shanzhi, Liu, Shixuan, Shi, Yang, Yang, Wenjing, Kuang, Kun, Wang, Haotian

arXiv.org Machine Learning

Detecting unobserved confounders is crucial for reliable causal inference in observational studies. Existing methods require either linearity assumptions or multiple heterogeneous environments, limiting applicability to nonlinear single-environment settings. To bridge this gap, we propose Kernel Regression Confounder Detection (KRCD), a novel method for detecting unobserved confounding in nonlinear observational data under single-environment conditions. KRCD leverages reproducing kernel Hilbert spaces to model complex dependencies. By comparing standard and higher-order kernel regressions, we derive a test statistic whose significant deviation from zero indicates unobserved confounding. Theoretically, we prove two key results: First, in infinite samples, regression coefficients coincide if and only if no unobserved confounders exist. Second, finite-sample differences converge to zero-mean Gaussian distributions with tractable variance. Extensive experiments on synthetic benchmarks and the Twins dataset demonstrate that KRCD not only outperforms existing baselines but also achieves superior computational efficiency.


Subgroup Discovery with the Cox Model

Izzo, Zachary, Melvin, Iain

arXiv.org Machine Learning

We study the problem of subgroup discovery for survival analysis, where the goal is to find an interpretable subset of the data on which a Cox model is highly accurate. Our work is the first to study this particular subgroup problem, for which we make several contributions. Subgroup discovery methods generally require a "quality function" in order to sift through and select the most advantageous subgroups. We first examine why existing natural choices for quality functions are insufficient to solve the subgroup discovery problem for the Cox model. To address the shortcomings of existing metrics, we introduce two technical innovations: the *expected prediction entropy (EPE)*, a novel metric for evaluating survival models which predict a hazard function; and the *conditional rank statistics (CRS)*, a statistical object which quantifies the deviation of an individual point from the distribution of survival times in an existing subgroup. We study the EPE and CRS theoretically and show that they can solve many of the problems with existing metrics. We introduce a total of eight algorithms for the Cox subgroup discovery problem. The main algorithm is able to take advantage of both the EPE and the CRS, allowing us to give theoretical correctness results for this algorithm in a well-specified setting. We evaluate all of the proposed methods empirically on both synthetic and real data. The experiments confirm our theory, showing that our contributions allow for the recovery of a ground-truth subgroup in well-specified cases, as well as leading to better model fit compared to naively fitting the Cox model to the whole dataset in practical settings. Lastly, we conduct a case study on jet engine simulation data from NASA. The discovered subgroups uncover known nonlinearities/homogeneity in the data and suggest design choices that have been mirrored in practice.


Topology Identification and Inference over Graphs

Mateos, Gonzalo, Shen, Yanning, Giannakis, Georgios B., Swami, Ananthram

arXiv.org Machine Learning

Topology identification and inference of processes evolving over graphs arise in timely applications involving brain, transportation, financial, power, as well as social and information networks. This chapter provides an overview of graph topology identification and statistical inference methods for multidimensional relational data. Approaches for undirected links connecting graph nodes are outlined, going all the way from correlation metrics to covariance selection, and revealing ties with smooth signal priors. To account for directional (possibly causal) relations among nodal variables and address the limitations of linear time-invariant models in handling dynamic as well as nonlinear dependencies, a principled framework is surveyed to capture these complexities through judiciously selected kernels from a prescribed dictionary. Generalizations are also described via structural equations and vector autoregressions that can exploit attributes such as low rank, sparsity, acyclicity, and smoothness to model dynamic processes over possibly time-evolving topologies. It is argued that this approach supports both batch and online learning algorithms with convergence rate guarantees, is amenable to tensor (that is, multi-way array) formulations as well as decompositions that are well-suited for multidimensional network data, and can seamlessly leverage high-order statistical information.


Google's AI Nano Banana Pro accused of generating racialised 'white saviour' visuals

The Guardian

The logos of organisations were also included in images generated by Google's Nano Banana Pro AI tool. Nano Banana Pro, Google's new AI-powered image generator, has been accused of creating racialised and "white saviour" visuals in response to prompts about humanitarian aid in Africa - and sometimes appends the logos of large charities. Asking the tool tens of times to generate an image for the prompt "volunteer helps children in Africa" yielded, with two exceptions, a picture of a white woman surrounded by Black children, often with grass-roofed huts in the background. In several of these images, the woman wore a T-shirt emblazoned with the phrase "Worldwide Vision", and with the UK charity World Vision's logo.


Hybrid-AIRL: Enhancing Inverse Reinforcement Learning with Supervised Expert Guidance

Silue, Bram, Amaya-Corredor, Santiago, Mannion, Patrick, Willem, Lander, Libin, Pieter

arXiv.org Artificial Intelligence

Adversarial Inverse Reinforcement Learning (AIRL) has shown promise in addressing the sparse reward problem in reinforcement learning (RL) by inferring dense reward functions from expert demonstrations. However, its performance in highly complex, imperfect-information settings remains largely unexplored. To explore this gap, we evaluate AIRL in the context of Heads-Up Limit Hold'em (HULHE) poker, a domain characterized by sparse, delayed rewards and significant uncertainty. In this setting, we find that AIRL struggles to infer a sufficiently informative reward function. To overcome this limitation, we contribute Hybrid-AIRL (H-AIRL), an extension that enhances reward inference and policy learning by incorporating a supervised loss derived from expert data and a stochastic regularization mechanism. We evaluate H-AIRL on a carefully selected set of Gymnasium benchmarks and the HULHE poker setting. Additionally, we analyze the learned reward function through visualization to gain deeper insights into the learning process. Our experimental results show that H-AIRL achieves higher sample efficiency and more stable learning compared to AIRL. This highlights the benefits of incorporating supervised signals into inverse RL and establishes H-AIRL as a promising framework for tackling challenging, real-world settings.


Achieving Rotational Invariance with Bessel-Convolutional Neural Networks

Neural Information Processing Systems

Convolutional Neural Networks (CNNs) are currently among the most powerful tools for image analysis. Thanks to convolutions, they achieve invariance with respect to translations.
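
The translation property mentioned in this summary can be illustrated with a small sketch. It is not the Bessel-CNN method from the paper. It is a pure-Python toy, assuming a 1-D signal with circular (wrap-around) boundaries, and the names `circular_conv` and `shift` are hypothetical helpers introduced here. Strictly speaking, the convolution itself commutes with shifts (equivariance), and a global pooling step on top is what makes the final output unchanged by the shift (invariance).

```python
def circular_conv(signal, kernel):
    """Circular 1-D cross-correlation of signal with kernel (toy example)."""
    n = len(signal)
    return [
        sum(kernel[j] * signal[(i + j) % n] for j in range(len(kernel)))
        for i in range(n)
    ]


def shift(signal, k):
    """Cyclically shift signal to the right by k positions."""
    n = len(signal)
    return [signal[(i - k) % n] for i in range(n)]


# Arbitrary integer signal and kernel, chosen only for illustration.
signal = [0, 1, 3, 2, 0, 0, 5, 1]
kernel = [1, -2, 1]

# Filtering then shifting gives the same result as shifting then filtering.
conv_then_shift = shift(circular_conv(signal, kernel), 3)
shift_then_conv = circular_conv(shift(signal, 3), kernel)
equivariant = conv_then_shift == shift_then_conv

# A global max over the feature map does not depend on the shift.
invariant_after_pooling = (
    max(shift_then_conv) == max(circular_conv(signal, kernel))
)

print(equivariant, invariant_after_pooling)
```

The same check fails for rotations of a 2-D image under an ordinary convolution, which is the gap a rotation-invariant architecture aims to close.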